Residuals. Scatterplots can be deceiving. The y-intercept. residual, e = y - ŷ


Learning Objectives
At the end of this chapter, students will be able to: understand that sometimes there may be subsets in the data worth exploring separately; describe how unusual data points affect the regression model and the correlation coefficient; and create a residual plot and look for patterns in the plot.

We will look at: pattern changes in scatterplots; the dangers of extrapolation; the possible effects of outliers, high leverage, and influential points; the problem of regression of summary data; and the mistake of inferring causation.

Issues and Problems with Regression
Subsets and curves. Dangers of extrapolation. Possible effects of outliers, high leverage, and influential points. Problems with regression of summary data. Mistakes of inferring causation.

Recall: The y-intercept
Sometimes the y-intercept is not realistic or not possible. Here the y-intercept shows a negative blood alcohol content, which makes no sense, but the negative value is appropriate for the equation of the regression line.

Recall: Residuals
The distances from each point to the least-squares regression line give us potentially useful information about the contribution of individual data points to the overall pattern of scatter. These distances are called residuals. Points above the line have a positive residual; points below the line have a negative residual. The sum of these residuals is always 0. There is a lot of scatter in the data, and the line is just an estimate.

residual, e = y - ŷ (observed y minus predicted y)

What else can residuals tell us?
Because a linear regression model is not always appropriate for the data, you should assess the appropriateness of the model by defining residuals and examining residual plots. Histograms (and other graphs) of residuals can reveal subsets of data that will enhance our understanding of the original data and may lead us to analyze the subsets separately.
[Figures: histogram of residuals; scatterplot of residuals]

Scatterplots can be deceiving
When working with a linear model, we must have data that are linear. ALWAYS check the residual plot for a pattern when you are trying to verify linearity!
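The residual check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the data set (y = x^2) and the helper names `least_squares` and `residuals` are assumptions chosen to show a curve that a straight-line fit hides.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Residual e = y - y_hat (observed minus predicted) for every point."""
    slope, intercept = least_squares(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical curved data (an assumption for illustration): y = x^2
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x * x for x in xs]
res = residuals(xs, ys)

# Print the residuals in x order; a run of same-signed residuals signals a bend.
for x, e in zip(xs, res):
    print(f"x = {x}: residual = {e:+.2f}")
print("sum of residuals is (numerically) zero:", abs(sum(res)) < 1e-9)
```

The residuals are positive at both ends and negative in the middle. That systematic pattern is the "bend" a residual plot is meant to expose, even though the residuals still sum to zero.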

Scatterplots can be deceiving
Read the Penguin Case and analyze the scatterplots.

Getting the Bends
The scatterplot of residuals against Duration of emperor penguin dives holds a surprise. The Linearity Assumption says we should not see a pattern, but instead there is a bend. Even though it means checking the Straight Enough Condition after you find the regression, it's always good to check your scatterplot of the residuals for bends that you might have overlooked in the original scatterplot.

Sifting Residual from Groups
It is a good idea to look at both a histogram of the residuals and a scatterplot of the residuals vs. predicted values. In the regression predicting Calories from Sugar content in cereals, the small modes in the histogram are marked with different colors and symbols in the residual plot. What do you see?

An examination of residuals often leads us to discover groups of observations that are different from the rest. When we discover that there is more than one group in a regression, we may decide to analyze the groups separately, using a different model for each group.

Hard to see curves
Sometimes the scatterplot looks straight enough, and a non-linear relationship only comes to light after you look at the residual plot.

3 Getting the Bends No regression analysis is complete without a display of the residuals to check that the linear model is reasonable. Because the residuals are what is left over after the model describes the relationship, they often reveal subtleties that were not clear from a plot of the original data. Sometimes the subtleties we see are additional details that help confirm or refine our understanding. Sometimes they reveal violations of regression conditions that require our attention. Getting the Bends Linear regression only works for linear models. (That sounds obvious, but when you fit a regression, you can t take it for granted.) A curved relationship between two variables might not be apparent when looking at a scatterplot alone, but will be more obvious in a plot of the residuals. Remember, we want to see nothing in a plot of the residuals. Subsets Here s an important unstated condition for fitting models: All the data must come from the same group. When we discover that there is more than one group in a regression, neither modeling the groups together nor modeling them apart is necessarily correct. You must determine what makes the most sense. In the following example, we see that modeling them apart makes sense. Subsets The figure shows regression lines fit to calories and sugar for each of the three cereal shelves in a supermarket: Caution: Extrapolation Ahead! When we make predictions with a regression line that are far outside the range of our explanatory (x) values, we are using extrapolation. In other words, extrapolation is the use of a regression line for prediction outside a known range of x values. Extrapolations are often not accurate Associations for variables can be trusted only for the range of values for which data have been collected (known as interpolation) Note: even a very strong relationship may not hold outside the data s range Caution: Beware of Extrapolation! Do the graphs make sense?!!!!!!

There is quite some variation in BAC for the same number of beers drunk. A person's blood volume is a factor in the equation that we have overlooked. Now we change the number of beers to the number of beers/weight of a person in pounds. Note how much smaller the variation is. An individual's weight was indeed influencing our response variable, blood alcohol content.

[Figure caption: "I'm drunk!"]

Bacterial growth rate changes over time in closed cultures. If you only observed bacterial growth in test tubes during a small subset of the time shown here, you could get almost any regression line imaginable. Extrapolation = big mistake.

Caution: Beware of Extrapolation!
Sarah's height was plotted against her age. Can you predict her height at age 42 months? Can you predict her height at age 30 years (360 months)?

[Figure: table and scatterplot of height (cm) against age (months)]

Using the regression line, the predicted height at age 42 months is y = 88 cm. At age 30 years, she is predicted to be 6' 10.5" tall!

Making predictions: Interpolation
The equation of the least-squares regression line allows you to predict y for any x within the range studied. This is called interpolating. Nobody in the study drank 6.5 beers, but by finding the value of ŷ from the regression line for x = 6.5, we obtain an expected blood alcohol content (in mg/ml).

Predicting the Future: Extrapolation
Extrapolation is the use of a regression line for predictions outside the range of x-values used to obtain the line. Extrapolations can get us in trouble. When the x-variable is Time, extrapolation becomes an attempt to peer into the future. People have always wanted to see into the future.
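To make the interpolation/extrapolation distinction concrete, here is a small Python sketch. The growth records below are hypothetical values invented for illustration (they are not the numbers from the slide), and `least_squares`, `predict`, and `is_extrapolation` are illustrative helper names.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical growth records for a young child (assumed, NOT the slide's data).
ages_months = [36, 42, 48, 54, 60]
heights_cm = [86.0, 88.5, 90.5, 93.0, 95.0]

slope, intercept = least_squares(ages_months, heights_cm)


def predict(age):
    """Predicted height (cm) from the fitted line."""
    return intercept + slope * age


def is_extrapolation(age):
    """True when the age lies outside the range of observed x-values."""
    return not (min(ages_months) <= age <= max(ages_months))


for age in (42, 360):
    label = "extrapolation" if is_extrapolation(age) else "interpolation"
    print(f"age {age} months: predicted {predict(age):.1f} cm ({label})")
```

Inside the observed range the prediction is sensible. At 360 months the same line predicts a height of more than two metres, because the line assumes the childhood growth rate continues unchanged for decades.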

5 Predicting the Future: Extrapolation The model should only be trusted for the span of x-values it represents. Extrapolations assume that past trends will continue into the far future. An example of extrapolation in the news Women may run faster than men in 2156 Extrapolation The farther our x-value is from the mean of x, the less we trust our predicted value. Once we venture into new x territory, our predicted value is an extrapolation. Extrapolations are dubious because they require the additional and very questionable assumption that nothing about the relationship between x and y changes even at extreme values of x. Knowing that extrapolation is dangerous doesn t stop people. The temptation to see into the future is hard to resist. Here s some more realistic advice: If you must extrapolate into the future, at least don t believe that the prediction will come true. Extrapolation Here is a timeplot of the Energy Information Administration (EIA) predictions and actual prices of oil barrel prices. How did forecasters do? They seemed to have missed a sharp run-up in oil prices in the past few years. Outliers Data points that stand away from the others/ diverge in a big way from the overall pattern. Outlying points can strongly influence a regression. Even a single point far from the body of the data can dominate the analysis. Outliers can be extraordinary by having large residuals or by having high leverage. Outliers There are four ways that a data point might be considered an outlier. It could have an extreme x-value compared to other data points. It could have an extreme y-value compared to other data points. It could have extreme x and y values. It might be distant from the rest of the data, even without extreme x or y values.

6 Extreme x-value Extreme x and y Extreme y-value Distant data point Influential Point an outlier that greatly affects the slope of the regression line. One way to test the influence of an outlier is to compute the regression equation with and without the outlier. This type of analysis is illustrated below. The scatter plots are identical, except that the plot on the right includes an outlier. The slope is flatter when the outlier is present (-3.32 vs ), so this outlier would be considered an influential point. Influential Point The scatterplots below compare regression statistics for another data set with and without an outlier. Here, the chart on the right has a single outlier, located at the high end of the x-axis (where x = 24). As a result of that single outlier, the slope of the regression line changes greatly, from -2.5 to -1.6; so the outlier would be considered an influential point. Influential Point Sometimes, an influential point will cause the coefficient of determination to be bigger; sometimes, smaller. In the first example above, the coefficient of determination is smaller when the influential point is present (0.94 vs. 0.55). In the second example, it is bigger (0.46 vs. 0.52). Influential Point If your data set includes an influential point, here are some things to consider: An influential point may represent bad data, possibly the result of measurement error. If possible, check the validity of the data point. Compare the decisions that would be made based on regression equations defined with and without the influential point. If the equations lead to contrary decisions, use caution. Test your understanding of this lesson In the context of regression analysis, which of the following statements are true? I) When the data set includes an influential point, the data set is nonlinear. II) Influential points always reduce the coefficient of determination. III) All outliers are influential data points. 
(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above

The correct answer is (E). Data sets with influential points can be linear or nonlinear. In this lesson, we went over an example in which an influential point increased the coefficient of determination. With respect to regression, outliers are influential only if they have a big effect on the regression equation. Sometimes, outliers do not have big effects. For example, when the data set is very large, a single outlier may not have a big effect on the regression equation.
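The "fit with and without the point" test for influence can be sketched as follows. The data are hypothetical (a perfectly linear set plus one assumed outlier at x = 24, echoing the location mentioned in the lesson but not its actual values), and `least_squares` is an illustrative helper.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: five points on a falling line, plus one suspect point.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]
suspect = (24, 10)  # assumed outlier at the high end of the x-axis

slope_without, _ = least_squares(xs, ys)
slope_with, _ = least_squares(xs + [suspect[0]], ys + [suspect[1]])

print(f"slope without the point: {slope_without:.3f}")
print(f"slope with the point:    {slope_with:.3f}")
print(f"change in slope:         {slope_with - slope_without:+.3f}")
```

Here a single point far out on the x-axis flips the sign of the slope, so it would be judged influential. How large a change counts as "very different" is a judgment call that depends on the decisions the model supports.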

7 Influential Point A point is influential if omitting it from the analysis gives a very different model (changes the slope of the line) High leverage points can also be influential, but do not need to be Not all outliers and leverage points are influential Fit the regression line with and without to determine the influence Influential Point A leverage point that does goes against the overall pattern of the data is an influential point. However, points that are at the edges of our data s range of x-values that go against the overall pattern will also be influential. Outliers whose x-value lies in the center of the range of x-values will NOT be influential; they do not affect the linear model much. However, such points WILL weaken the correlation (r). Influential Point Influential points are points with high leverage. They highly influence the slope of the regression line and the correlation coefficient. Influential points can be more easily seen in scatterplots of the original data or by finding a regression model with and without the points. The surest way to verify that a point is influential is to find the regression line with and without the suspect point. If the line moves more than a small amount when the point is deleted, the point is influential (for the LSRL). Leverage Data points whose x-values are far from the mean of x are said to exert leverage on a linear model. Points that are extraordinary in their x-value can especially influence a regression model. We say that they have high leverage. Can have large effect on the line high leverage points pull the regression line close to them, sometimes completely determining the slope and intercept. With high enough leverage, their residuals can appear deceptively small. (p. 205) Leverage Leverage points can either confirm the pattern of data, or they may go against the pattern. If they confirm the pattern, the r value is increased by the leverage point s presence. 
In other words, if the range of x-values for which a pattern applies is widened or extended, the correlation will increase. Imagine a fulcrum (for a see-saw) in the scatterplot, located at the central x-values; a leverage point is a point far out on the see-saw. Leverage points can have a big effect on r or on our linear model, just as a small person can have a big effect on a see-saw if they are seated far enough from the fulcrum.

Types of Unusual Points
1) High-leverage points with small residuals. These points confirm the pattern but are extreme values. The slope and intercept are mostly unaffected, but the r² value will increase; don't be misled into thinking that the model is now stronger.

8 Types of Unusual Points 2) Outliers Not high leverage, not influential and large residual Does not affect the slope, but aren t consistent with pattern. Will change the intercept. Don t throw away. x value is near center of mean of x-values. Types of Unusual Points 3) Influential Points also high leverage and probably residual These are most troublesome. They aren t consistent with model and if the point is removed the slope of line dramatically changes it changes the model. Don t throw it out without thinking. Types of Unusual Points Typically, a point that is an outlier in the x-direction will exert influence on the line. Points tug at the regression line in an attempt to make their residuals smaller. But the regression line pivots around the mean-mean point. Points close to that fulcrum (left to right) can't make their residuals much smaller, hence they do not change the slope of the line much. Points far away (in the x-direction) can exert a lot of leverage changes in the slope can make their residuals much smaller. The extraordinarily large shoe size gives the data point high leverage. Wherever the IQ is, the line will follow! When we investigate an unusual point, we often learn more about the situation than we could have learned from the model alone. You cannot simply delete unusual points from the data. You can, however, fit a model with and without these points as long as you examine and discuss the two regression models to understand how they differ.
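The first type of unusual point (high leverage, small residual) can be illustrated numerically. This sketch uses made-up data and the illustrative helper `fit`; the extra point is placed exactly on the line fitted to the other points, which is an idealized assumption.

```python
def fit(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical, fairly scattered data.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 5]
slope_without, intercept_without, r2_without = fit(xs, ys)

# Add a high-leverage point (x far from the mean) that lies on the fitted line.
x_far = 20
y_far = intercept_without + slope_without * x_far
slope_with, intercept_with, r2_with = fit(xs + [x_far], ys + [y_far])

print(f"without: slope = {slope_without:.3f}, r^2 = {r2_without:.3f}")
print(f"with:    slope = {slope_with:.3f}, r^2 = {r2_with:.3f}")
```

The slope is unchanged, yet r² rises sharply. The original five points are scattered just as widely as before, which is why a higher r² from a confirming leverage point should not be read as a stronger model.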

9 Warning: Influential points can hide in plots of residuals. Points with high leverage pull the line close to them, so they often have small residuals. You ll see influential points more easily in scatterplots of the original data or by finding a regression model with and without the points. When a point with high leverage lines up with the rest of the data, it doesn t influence the slope but it does increase the R 2. Removing a point that is an x-outlier (high leverage) but not a model outlier can actually decrease your R 2. 1) Not high leverage, not influential, large residual 2) High leverage, not influential, small residual 3) High leverage, influential, not large residual Beware the Lurking Variable! A lurking variable: a variable that is not included in the study but still potentially affects the relationship among the variables in a study A lurking variable can falsely suggest a strong relationship between x and y or it can hide a relationship that is really there Sometimes the relationship between two variables is influenced by other variables that we did not measure or even think about With observational data, as opposed to designed experiments, there is not way to be sure that a lurking variable is not the cause of any apparent association. The lurking variable is some third variable (not the explanatory or predictor variable) that is driving both variables you have observed. A lurking variable is sometimes referred to as common response. It s a variable that drives two other variables, creating the impression of an association between them. For countries, pick any measure of technological modernity (# of TVs per capita) and life expectancy. You'll clearly see an association countries with fewer TVs have lower life expectancy. Such lurking variables as general economic well-being and standard of living probably explain both. We don't think that having a TV increases your lifespan. Beware the Lurking Variable! 
There's this guy who's going to clean the windows of a mental asylum. A patient follows him and shouts, "I gotta secret, I gotta secret...," but he ignores the patient. Again the patient follows him, and again he ignores his cries. By the time he has nearly finished the building, he's really curious about what the patient's secret is, so he decides to ask. The patient pulls a matchbox out of his pocket, opens it, and puts it on a table. Out crawls a little spider. The patient says, "Spider, go left," and the spider walks to its left a bit. Then he says, "Spider, go right," and the spider walks to its right a little bit. He says, "Spider, turn around, walk forward, then go right," and sure enough the spider turns around, walks forward, and then goes right a bit. The window cleaner is amazed. "Wow!" he says, "that's amazing!" "No, that's not my secret," says the patient. "Watch." He picks up the spider in his hand, pulls all its legs off, and puts it back on the table. "Spider, go right." The spider doesn't move. "Spider, go left." The spider doesn't move. "Spider, turn around." Again the spider doesn't move. "There!" he says. "That's my secret: if you pull all the spider's legs off, they go deaf..."

10 Lurking variables Describe the association. What is the lurking variable in these examples? How could you answer if you didn t know anything about the topic? Strong positive association between the number of firefighters at a fire site and the amount of damage a fire does Negative association between moderate amounts of wine drinking and death rates from heart disease in developed nations amountof wine drinking vs. death rates from heart disease in developed nations the number of firefighters at a fire site vs. the amount of damage a fire does How to spot the presence of the lurking variable? Because lurking variables are often unrecognized and unmeasured, detecting their effect is a challenge. Many lurking variables change systematically over time. Plot both the response variable and the regression residuals against the time order of the observations whenever possible. An understanding of the background of the data then allows you to guess what lurking variables might be present. Example: Discrimination in Medical Treatment? Studies show that men who complain of chest pain are more likely to get detailed tests and aggressive treatment such as bypass surgery than are women with similar complaints. Is this association between gender and treatment due to discrimination? Not necessarily. Men and women develop heart problems at different ages women are on the average between 10 and 15 years older than men. Aggressive treatments are more risky for older patients, so doctor s may hesitate to recommend them. Lurking variables the patients age and condition may explain the relationship between gender and doctors decisions. Example: TV and Life Expectancy Measure the number of television sets per person x and the average life expectancy y for the world s nations. There is a high correlation: nations with many TV sets have higher life expectancies. Could we lengthen the lives of people in Rwanda by shipping them TV sets? Example: TV and Life Expectancy No. 
Rich nations have more TV sets than poor nations. Rich nations also have longer life expectancies because they offer better nutrition, clean water, and better health care. Clearly, there is no cause-and-effect relationship between TV sets and length of life.

A Last Lurking Variable Example
A study showed that there was a strong correlation between the number of firefighters at a fire and the property damage that the fire causes. So maybe we should send fewer firefighters to fight fires? WRONG! If the fire is severe and/or already large, we should send more firefighters to fight the fire.
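A small simulation can show how a lurking variable manufactures a correlation. Everything here is assumed for illustration: fire severity is the hypothetical lurking variable, the coefficients and noise levels are invented, and comparing the correlation before and after removing the severity trend is one common check that the slides themselves do not describe.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def residuals_on(zs, vs):
    """Residuals of v after a least-squares fit of v on z."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_v = sum(vs) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szv = sum((z - mean_z) * (v - mean_v) for z, v in zip(zs, vs))
    slope = szv / szz
    intercept = mean_v - slope * mean_z
    return [v - (intercept + slope * z) for z, v in zip(zs, vs)]


random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical model: severity drives BOTH variables; neither causes the other.
severity = [random.uniform(1, 10) for _ in range(500)]
firefighters = [2 * z + random.gauss(0, 1) for z in severity]
damage = [5 * z + random.gauss(0, 2) for z in severity]

r_raw = correlation(firefighters, damage)
r_adjusted = correlation(residuals_on(severity, firefighters),
                         residuals_on(severity, damage))

print(f"r between firefighters and damage:   {r_raw:.2f}")
print(f"r after removing the severity trend: {r_adjusted:.2f}")
```

In this simulated world the raw correlation is strong, but it largely disappears once severity is accounted for. With real data the lurking variable is usually unmeasured, so this adjustment is only possible after someone thinks to record it.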

11 Causation vs. Association Some studies want to find the existence of causation. Examples of causation: Increased drinking of alcohol causes a decrease in coordination. Smoking and Lung Cancer. Examples of association: High SAT scores are associated with a high Freshman year GPA. Smoking and Lung Cancer. Correlation Does NOT Imply Causation! Even very strong correlations may not correspond to a real causal relationship. Evidence of Causation A properly conducted experiment establishes the connection Other considerations: A reasonable explanation for a cause and effect exists The connection happens in repeated trials The connection happens under varying conditions Potential confounding factors are ruled out Alleged cause precedes the effect in time Reasons Two Variables May Be Related (Correlated) Explanatory variable causes change in response variable Response variable causes change in explanatory variable Explanatory may have some cause, but is not the sole cause of changes in the response variable Confounding variables may exist Both variables may result from a common cause such as, both variables changing over time The correlation may be merely a coincidence Explanatory Causes Response Response Causes Explanatory Explanatory: pollen count from grasses Response: percentage of people suffering from allergy symptoms Explanatory: amount of food eaten Response: hunger level Explanatory: Hotel advertising dollars Response: percentage Occupancy rate Positive correlation? more advertising leads to increased occupancy rate? Actual correlation is negative: lower occupancy leads to more advertising

12 Explanatory is NOT Sole Contributor Explanatory: Consumption of barbecued foods Response: percentage Incidence of stomach cancer Barbecued foods are known to contain carcinogens, but other lifestyle choices may also contribute Association does not imply Causation An association between two variables x and y can reflect many types of relationship among x, y, and one or more lurking variables. An association between an explanatory variable (predictor) x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y. How to show Causation? The only way to get absolutely conclusive evidence of cause and effect or that x causes changes in y is to do an experiment in which we change x in an environment which we completely control, this keeps lurking variables under control When experiments cannot be done, finding the explanation for an observed association is often difficult and can even be controversial z is the lurking variable (dashed line indicates association, arrow indicated causation) Even a very strong association between two variables is NOT by itself a good evidence that there is a cause-andeffect link between the variables. The main question of establishing causation: How can a direct causal link between x and y be established? Explaining Association: Direct Causation Cause-and-effect Examples: Amount of fertilizer and yield of corn Weight of a car and its MPG Dosage of a drug and the survival rate of the mice We already know x and y are associated. The arrow shows causation between the two variables, i.e., x causes y. Explaining Association: Direct Causation x = mom s adult height y = daughter s adult height Experiments have shown that the mom s adult height is an appropriate predictor for a daughter s adult height. We have association AND causation. x = mother s body mass index (BMI) y = daughter s body mass index (BMI) Body part is determined by heredity. (based on a study). 
Daughters inherit half their genes from their mothers. There is therefore a direct causal link between the BMI of mothers and daughters. CAUTION: Even when direct causation is present, it is rarely a complete explanation of an association between two variables.

13 Causation Does smoking cause cancer? Did chemical weapons exposure cause health problems in Gulf War vets? Will increasing the speed limit increase traffic fatalities? Will bringing storks into an area increase the birth rate? Will lowering the drinking age limit in California affect the university dorm drinking parties? High temperatures in the summer lead to higher electricity use (fans, air conditioning, etc) Causation Example: Brothers and sisters heights are highly correlated. However a tall brother doesn't cause a tall sister. A more likely cause: common genetics Even though there may no be a causal relationship between two variables, it can still be useful to predict a sister's height from her brother's height. Common Response Refers to the possibility that a change in a lurking variable is causing changes in both our explanatory variable and our response variable. Example: It has been observed that children with more cavities tend to have larger vocabularies. However it is hard to see how more cavities might lead to larger vocabularies (or vice versa). However in this case, both variables are associated with age. # cavities Vocab. size Age Explaining Association: Common Response Both x and y change in response to changes in z, the lurking variable There may not be direct causal link between x and y. The lurking variable distort the true relation between x and y. Lurking variables can create nonsense correlations! x and y show an association but it is really the lurking variable z doing the work. Explaining Association: Common Response x = a high school senior s SAT score y = the student s first-year college grade point average Students who are smart and who have learned a lot tend to have both high SAT scores and high college grades. The positive correlation is explained by this common response (lurking variable) to students ability and knowledge. Bright students would tend to do well on both. 
Explaining Association: Common Response x = monthly flow of money into stock mutual funds y = monthly rate of return for the stock market There is a strong positive correlation between how much money individuals add to mutual funds each month and how well the stock market does the same month. Is the new money driving the market up? The correlation may be explained in part by common response to underlying investor sentiment: when optimism reigns, individuals send money to funds and large institutions also invest more. The institutions would drive up prices even if individuals did nothing. In addition, what causation there is may operate in the other direction: when the market is doing well, individuals rush to add money to their mutual funds.

14 Explaining Association: Common Response x = Divorce among men y = Percent abusing alcohol Both variables change due to common cause Both may result from an unhappy marriage. Explaining Association: Common Response Both Variables are Changing Over Time Both divorces and suicides have increased dramatically since Are divorces causing suicides? Are suicides causing divorces??? The population has increased dramatically since 1900 (causing both to increase). Better to investigate: Has the rate of divorce or the rate of suicide changed over time? Explaining Association: Confounding Two variables (whether explanatory or lurking) are confounded when their effects on a response variable cannot be distinguished from each other. Again, x and y show an association, but in this case, we are unable to determine whether x is causing y or if z is causing y. Confounding Two variables are confounded when you can t tell which of them (or whether it s the combination) had an affect. Refers to the possibility that either the change in our explanatory variable is causing changes in the response variable OR that a change in a lurking variable is causing changes in the response variable. You might want to test a fertilizer on your lawn. Suppose you spread it on half the lawn to see if the grass will look better there. If you spread it on the sunny half, leaving the shady half unfertilized, you won t know whether the greener grass resulted from fertilizer or sunshine (or the two together). Explaining Association: Confounding x = whether a person regularly attends religious services y = how long the person lives Many studies have found that people who are active in their religion live longer than nonreligious people. But people who attend church or mosque or synagogue also take better care of themselves than non-attenders. They are less likely to smoke, more likely to exercise, and less likely to be overweight. 
The effects of these good habits are confounded with the direct effects of attending religious services.

Explaining Association: Confounding
x = Meditation
y = Aging (measurable aging factor)
General concern for one's well-being may be confounded with the decision to try meditation.

15 Explaining Association: Confounding x = the number of years of education a worker has y = the worker s income It is likely that more education is a cause of higher income many many highly paid professions require advanced education. However, confounding is also present. People who have high ability and come from prosperous homes are more likely to get many years of education than people who are less able or poorer. Of course, people who start out able and rich are more likely to have high earnings even without much education. We can t say how much of the higher income of well- educated people is actually caused by their education. Confounding Example: Strength of molded parts x: time in mold y: strength of part In a study, higher strength was associated with longer mold times. Time The way the experiment was performed was to have all the samples at 10 seconds in the mold done first, then the samples at 20 seconds, then 30 seconds, and so on. They also saw a strong relationship between strength and the order done. The time in the mold and the order done were confounded. It ended up that the mold got warmer as more batches were done and higher temperature increases strength. Strength Temperature Confounding Suppose you want to compare laundry detergent A vs detergent B. You wash a bunch of loads using A and B. But you always put A in washer #1 and always put B in #2. Now you're confounded. You don t know if it's the detergent or the washing machine that made one load cleaner than the other. A store s special promotion may increase video rentals but the marketing folks cannot be sure that s what did it if the weather was particularly bad during the trial period. Bad weather may have kept people indoors and induced them to rent more videos anyway. Any actual effect of the special promotion is confounded by the weather. Common response When more kids eat ice cream, more kids drown. It s the warmer weather that's causing an increase in both. 
Both are responses to summer weather.

Students who are smart and who have learned a lot tend to have both high SAT scores and high college grades. The positive correlation is explained by the common response to students' ability and knowledge.

Many studies have shown a strong positive association between hours spent in religious activities (going to church, attending religious classes, praying, etc.) and life expectancy. This is NOT CAUSATION. There is confounding: on average, people who attend religious activities also take better care of themselves than people who do not attend. They are also less likely to smoke, more likely to exercise, and less likely to be overweight. The effects of these good habits (lurking variables) are confounded with the direct effects of attending religious activities.

Measure the number of television sets per person, x, and the average life expectancy, y, for the world's nations. There is a high positive correlation: nations with many TV sets have higher life expectancies. Could we lengthen the lives of people in Rwanda by shipping them TV sets? The scatterplot shows that the average life expectancy for a country is related to the number of televisions per person in that country.

Since televisions are cheaper than doctors, should we send TVs to countries with low life expectancies in order to extend lifetimes? How about considering a lurking variable? That makes more sense. Countries with higher standards of living have both longer life expectancies and more doctors (and TVs!). If higher living standards cause changes in these other variables, improving living standards might be expected to prolong lives and increase the numbers of doctors and TVs. The variables x and y are both responses to a common variable z, per capita income. While nations with higher per capita income have more TV sets than poor nations do, they also tend to have better nutrition, cleaner water, and better health care.

Does smoking cause lung cancer?
To know that smoking causes cancer, we would have to design an experiment in which we change the explanatory variable (smoking or not) in a controlled environment. Can we ethically make people smoke, drink, do illicit drugs, etc.? Are there other cause-and-effect questions similar to this scenario?

Does smoking cause lung cancer? Proving smoking causes lung cancer
The association between smoking and lung cancer is very strong.
The association is consistent across many studies. Many studies of different kinds of people in many countries link smoking to lung cancer. That reduces the chance that a lurking variable specific to one group or one study explains the association.
Higher doses are associated with stronger responses. That is, people who smoke more tend to get lung cancer more often.
The alleged cause precedes the effect in time. Lung cancer develops after years of smoking. It kills more men than any other form of cancer. Lung cancer was rare among women until women began to smoke.
The alleged cause is plausible. Experiments with animals show that tars from cigarette smoke do cause cancer.
Still, in some cases, a person might get lung cancer from pollution, working in a factory, etc.

Does smoking cause lung cancer?
Causation: smoking causes lung cancer.
Common response: people who have a genetic predisposition to lung cancer also have a genetic predisposition to smoking.
Confounding: people who drink too much, don't exercise, eat unhealthy foods, etc. are more likely to get lung cancer as a result of their lifestyle. Such people may be more likely to be smokers as well.

Car weight and gas mileage
Causation: physics says that the heavier an object is, the more energy is needed to move it, which implies worse gas mileage.
Common response: the type of car (van, sports car, SUV, etc.) influences the weight, along with other factors that affect the gas mileage of the car.
Confounding: although weight has a causal effect, the size of that effect cannot be accurately ascertained, since weight is confounded with a number of other factors, such as engine size or horsepower.
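The common-response pattern described above (ice cream and drowning, TV sets and life expectancy) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are synthetic, the variable names and the helper functions `pearson_r` and `residuals_on` are hypothetical, and x and y are constructed so that neither affects the other. Both depend only on a lurking variable z. Under these assumptions the population correlation between x and y is 0.5, yet it vanishes once the linear effect of z is removed from each.

```python
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals_on(z, v):
    """Residuals (observed minus predicted) from the least-squares line of v on z."""
    n = len(z)
    mz, mv = sum(z) / n, sum(v) / n
    slope = (sum((a - mz) * (b - mv) for a, b in zip(z, v))
             / sum((a - mz) ** 2 for a in z))
    return [b - (mv + slope * (a - mz)) for a, b in zip(z, v)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Assumed setup: z is the lurking variable (think standardized per capita income).
z = [rng.gauss(0, 1) for _ in range(n)]
# x and y each respond to z plus independent noise; x never enters y and vice versa.
x = [zi + rng.gauss(0, 1) for zi in z]  # e.g. TV sets per person
y = [zi + rng.gauss(0, 1) for zi in z]  # e.g. life expectancy

r_xy = pearson_r(x, y)
# Correlation left over after the linear effect of z is removed from both variables.
r_xy_given_z = pearson_r(residuals_on(z, x), residuals_on(z, y))

print(f"r(x, y)            = {r_xy:.2f}")
print(f"r(x, y) net of z   = {r_xy_given_z:.2f}")
```

In real data the lurking variable is often unmeasured, so this adjustment is usually not available; that is why a designed experiment remains the more convincing route to a causal claim.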

More examples of causation errors
a) The number of firefighters at a fire vs. the amount of damage in dollars
b) The number of hours students work vs. their grades
c) The number of hours of TV watched by young children vs. the length of their attention span

Establishing Causation
The only compelling method: a designed experiment (Chapters 12-13).
Hot disputes: Does gun control reduce violent crime? Does living near power lines cause cancer? Does smoking cause lung cancer? Does the use of fossil fuels cause global warming?

Beware of Correlations Based on Averaged Data!
Many regression and correlation studies work with averages or other measures that combine information from many individuals. Note when researchers use such techniques. Resist the temptation to apply the results of such studies to individuals. Correlations based on averages are usually too high when applied to individuals. Note exactly what variables were measured in a statistical study.

Working with Summary Values
Be cautious when working with data values that are summaries, such as means and medians. These values have less variability and therefore inflate the strength of the relationship (correlation).
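A short simulation can make the warning about averaged data concrete. The sketch below is a hedged illustration, not data from the slides: the heights, the slope of 5 pounds per inch, the intercept, and the noise level are all invented, and `pearson_r` is a hypothetical helper. It compares the correlation computed on individuals with the correlation computed on the mean weight at each height.

```python
import random
from collections import defaultdict


def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumed synthetic data: heights in whole inches, weights in pounds
# with a linear trend plus substantial person-to-person scatter.
heights = [rng.randint(64, 76) for _ in range(1000)]
weights = [5.0 * h - 180.0 + rng.gauss(0, 30) for h in heights]

# Summary data: one point per height value, using the mean weight at that height.
by_height = defaultdict(list)
for h, w in zip(heights, weights):
    by_height[h].append(w)
mean_heights = sorted(by_height)
mean_weights = [sum(by_height[h]) / len(by_height[h]) for h in mean_heights]

r_individuals = pearson_r(heights, weights)
r_means = pearson_r(mean_heights, mean_weights)

print(f"r on individuals:          {r_individuals:.2f}")
print(f"r on mean weight by height: {r_means:.2f}")
```

Averaging cancels most of the individual scatter within each height group while leaving the trend intact, so the line appears to fit the means far better than it fits any individual.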

Working With Summary Values
Scatterplots of statistics summarized over groups tend to show LESS variability than we would see if we measured the same variable on individuals. This is because the summary statistics themselves vary less than the data on the individuals do.

For example, consider plotting the heights and weights of students in a certain grade. Then imagine that instead of plotting a point for each individual student, we find the average weight for each height and plot the heights vs. the average weights. This second scatterplot is likely to have a lot less scatter and a higher R^2 value. Why? Because means vary less than individual values do. This is somewhat similar to what we saw with Simpson's paradox: sometimes by lumping data together (in this case by using summary statistics) we LOSE information.

Working With Summary Values (cont.)
There is a strong, positive, linear association between weight (in pounds) and height (in inches) for men.
If instead of data on individuals we only had the mean weight for each height value, we would see an even stronger association.

Means vary less than individual values. Scatterplots of summary statistics show less scatter than the baseline data on individuals. This can give a false impression of how well a line summarizes the data. There is no simple correction for this phenomenon. Once we have summary data, there's no simple way to get the original values back.

[Figure: scatter plot of summary data, MeanFin vs. MeanMid; fitted line as transcribed: MeanFin = 0.808 MeanMid; r^2 = 0.98]

[Figure: scatter plot of all data points, Fin vs. Mid; fitted line as transcribed: Fin = Mid + 72; r^2 value missing]

When you make a linear model:
Make sure the relationship is straight.
Beware of extrapolating.
Beware especially of extrapolating into the future.
Be on guard for different groups in your regression.
Look for outliers.
Beware of high leverage points, and especially of those that are influential.
Consider comparing two regressions.
Treat outliers honestly.
Beware of lurking variables.
Watch out when dealing with data that are summaries.

Section The Question of Causation

Section The Question of Causation Section 2.5 - The Question of Causation Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Causation Does smoking cause cancer? Did chemical weapons exposure cause health problems in Gulf War

More information

3.4 What are some cautions in analyzing association?

3.4 What are some cautions in analyzing association? 3.4 What are some cautions in analyzing association? Objectives Extrapolation Outliers and Influential Observations Correlation does not imply causation Lurking variables and confounding Simpson s Paradox

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Name Date Per Key Vocabulary: response variable explanatory variable independent variable dependent variable scatterplot positive association negative association linear correlation r-value regression

More information

Causation. Victor I. Piercey. October 28, 2009

Causation. Victor I. Piercey. October 28, 2009 October 28, 2009 What does a high correlation mean? If you have high correlation, can you necessarily infer causation? What issues can arise? What does a high correlation mean? If you have high correlation,

More information

Section 3.2 Least-Squares Regression

Section 3.2 Least-Squares Regression Section 3.2 Least-Squares Regression Linear relationships between two quantitative variables are pretty common and easy to understand. Correlation measures the direction and strength of these relationships.

More information

3.2 Least- Squares Regression

3.2 Least- Squares Regression 3.2 Least- Squares Regression Linear (straight- line) relationships between two quantitative variables are pretty common and easy to understand. Correlation measures the direction and strength of these

More information

STAT 201 Chapter 3. Association and Regression

STAT 201 Chapter 3. Association and Regression STAT 201 Chapter 3 Association and Regression 1 Association of Variables Two Categorical Variables Response Variable (dependent variable): the outcome variable whose variation is being studied Explanatory

More information

Further Mathematics 2018 CORE: Data analysis Chapter 3 Investigating associations between two variables

Further Mathematics 2018 CORE: Data analysis Chapter 3 Investigating associations between two variables Chapter 3: Investigating associations between two variables Further Mathematics 2018 CORE: Data analysis Chapter 3 Investigating associations between two variables Extract from Study Design Key knowledge

More information

Chapter 4: More about Relationships between Two-Variables Review Sheet

Chapter 4: More about Relationships between Two-Variables Review Sheet Review Sheet 4. Which of the following is true? A) log(ab) = log A log B. D) log(a/b) = log A log B. B) log(a + B) = log A + log B. C) log A B = log A log B. 5. Suppose we measure a response variable Y

More information

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph. STAT 280 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?

More information

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60

M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 60 M 140 Test 1 A Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points 1-10 10 11 3 12 4 13 3 14 10 15 14 16 10 17 7 18 4 19 4 Total 60 Multiple choice questions (1 point each) For questions

More information

Chapter 3: Describing Relationships

Chapter 3: Describing Relationships Chapter 3: Describing Relationships Objectives: Students will: Construct and interpret a scatterplot for a set of bivariate data. Compute and interpret the correlation, r, between two variables. Demonstrate

More information

STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS

STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS Circle the best answer. This scenario applies to Questions 1 and 2: A study was done to compare the lung capacity of coal miners to the lung

More information

AP Statistics Practice Test Ch. 3 and Previous

AP Statistics Practice Test Ch. 3 and Previous AP Statistics Practice Test Ch. 3 and Previous Name Date Use the following to answer questions 1 and 2: A researcher measures the height (in feet) and volume of usable lumber (in cubic feet) of 32 cherry

More information

4.2 Cautions about Correlation and Regression

4.2 Cautions about Correlation and Regression 4.2 Cautions about Correlation and Regression Two statisticians were traveling in an airplane from Los Angeles to New York City. About an hour into the flight, the pilot announced that although they had

More information

Regression Equation. November 29, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions

Regression Equation. November 29, S10.3_3 Regression. Key Concept. Chapter 10 Correlation and Regression. Definitions MAT 155 Statistical Analysis Dr. Claude Moore Cape Fear Community College Chapter 10 Correlation and Regression 10 1 Review and Preview 10 2 Correlation 10 3 Regression 10 4 Variation and Prediction Intervals

More information

Administrative Information

Administrative Information Administrative Information Lectures: Tue/Thu 2:20-3:40 One lecture or Two? Location: room 2311 OR room 1441 (if seminars are being held in 2311) Instructor: Prof. M. Alex O. Vasilescu Office Hours: Tue

More information

Chapter 7: Descriptive Statistics

Chapter 7: Descriptive Statistics Chapter Overview Chapter 7 provides an introduction to basic strategies for describing groups statistically. Statistical concepts around normal distributions are discussed. The statistical procedures of

More information

CHAPTER ONE CORRELATION

CHAPTER ONE CORRELATION CHAPTER ONE CORRELATION 1.0 Introduction The first chapter focuses on the nature of statistical data of correlation. The aim of the series of exercises is to ensure the students are able to use SPSS to

More information

Eating and Sleeping Habits of Different Countries

Eating and Sleeping Habits of Different Countries 9.2 Analyzing Scatter Plots Now that we know how to draw scatter plots, we need to know how to interpret them. A scatter plot graph can give us lots of important information about how data sets are related

More information

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize

More information

Homework Linear Regression Problems should be worked out in your notebook

Homework Linear Regression Problems should be worked out in your notebook Homework Linear Regression Problems should be worked out in your notebook 1. Following are the mean heights of Kalama children: Age (months) 18 19 20 21 22 23 24 25 26 27 28 29 Height (cm) 76.1 77.0 78.1

More information

STATISTICS INFORMED DECISIONS USING DATA

STATISTICS INFORMED DECISIONS USING DATA STATISTICS INFORMED DECISIONS USING DATA Fifth Edition Chapter 4 Describing the Relation between Two Variables 4.1 Scatter Diagrams and Correlation Learning Objectives 1. Draw and interpret scatter diagrams

More information

Unit 3 Lesson 2 Investigation 4

Unit 3 Lesson 2 Investigation 4 Name: Investigation 4 ssociation and Causation Reports in the media often suggest that research has found a cause-and-effect relationship between two variables. For example, a newspaper article listed

More information

BIVARIATE DATA ANALYSIS

BIVARIATE DATA ANALYSIS BIVARIATE DATA ANALYSIS Sometimes, statistical studies are done where data is collected on two variables instead of one in order to establish whether there is a relationship between the two variables.

More information

Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression Equation of Regression Line; Residuals Effect of Explanatory/Response Roles Unusual Observations Sample

More information

M 140 Test 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

M 140 Test 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75 M 140 est 1 A Name (1 point) SHOW YOUR WORK FOR FULL CREDI! Problem Max. Points Your Points 1-10 10 11 10 12 3 13 4 14 18 15 8 16 7 17 14 otal 75 Multiple choice questions (1 point each) For questions

More information

Chapter 4: More about Relationships between Two-Variables

Chapter 4: More about Relationships between Two-Variables 1. Which of the following scatterplots corresponds to a monotonic decreasing function f(t)? A) B) C) D) G Chapter 4: More about Relationships between Two-Variables E) 2. Which of the following transformations

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Relationships. Between Measurements Variables. Chapter 10. Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Relationships. Between Measurements Variables. Chapter 10. Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc. Relationships Chapter 10 Between Measurements Variables Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc. Thought topics Price of diamonds against weight Male vs female age for dating Animals

More information

Lecture 6B: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression

Lecture 6B: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression Lecture 6B: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression! Equation of Regression Line; Residuals! Effect of Explanatory/Response Roles! Unusual Observations! Sample

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!

More information

Homework #3. SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

Homework #3. SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. Homework #3 Name Due Due on on February Tuesday, Due on February 17th, Sept Friday 28th 17th, Friday SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. Fill

More information

Chapter 3 Review. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Chapter 3 Review. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question. Name: Class: Date: Chapter 3 Review Multiple Choice Identify the choice that best completes the statement or answers the question. Scenario 3-1 The height (in feet) and volume (in cubic feet) of usable

More information

Welcome to OSA Training Statistics Part II

Welcome to OSA Training Statistics Part II Welcome to OSA Training Statistics Part II Course Summary Using data about a population to draw graphs Frequency distribution and variability within populations Bell Curves: What are they and where do

More information

3.2A Least-Squares Regression

3.2A Least-Squares Regression 3.2A Least-Squares Regression Linear (straight-line) relationships between two quantitative variables are pretty common and easy to understand. Our instinct when looking at a scatterplot of data is to

More information

The Human Side of Science: I ll Take That Bet! Balancing Risk and Benefit. Uncertainty, Risk and Probability: Fundamental Definitions and Concepts

The Human Side of Science: I ll Take That Bet! Balancing Risk and Benefit. Uncertainty, Risk and Probability: Fundamental Definitions and Concepts The Human Side of Science: I ll Take That Bet! Balancing Risk and Benefit Uncertainty, Risk and Probability: Fundamental Definitions and Concepts What Is Uncertainty? A state of having limited knowledge

More information

Chapter 3 CORRELATION AND REGRESSION

Chapter 3 CORRELATION AND REGRESSION CORRELATION AND REGRESSION TOPIC SLIDE Linear Regression Defined 2 Regression Equation 3 The Slope or b 4 The Y-Intercept or a 5 What Value of the Y-Variable Should be Predicted When r = 0? 7 The Regression

More information

5 To Invest or not to Invest? That is the Question.

5 To Invest or not to Invest? That is the Question. 5 To Invest or not to Invest? That is the Question. Before starting this lab, you should be familiar with these terms: response y (or dependent) and explanatory x (or independent) variables; slope and

More information

Reminders/Comments. Thanks for the quick feedback I ll try to put HW up on Saturday and I ll you

Reminders/Comments. Thanks for the quick feedback I ll try to put HW up on Saturday and I ll  you Reminders/Comments Thanks for the quick feedback I ll try to put HW up on Saturday and I ll email you Final project will be assigned in the last week of class You ll have that week to do it Participation

More information

Chapter 11. Experimental Design: One-Way Independent Samples Design

Chapter 11. Experimental Design: One-Way Independent Samples Design 11-1 Chapter 11. Experimental Design: One-Way Independent Samples Design Advantages and Limitations Comparing Two Groups Comparing t Test to ANOVA Independent Samples t Test Independent Samples ANOVA Comparing

More information

Chapter Eight: Multivariate Analysis

Chapter Eight: Multivariate Analysis Chapter Eight: Multivariate Analysis Up until now, we have covered univariate ( one variable ) analysis and bivariate ( two variables ) analysis. We can also measure the simultaneous effects of two or

More information

Teaching Family and Friends in Your Community

Teaching Family and Friends in Your Community 2 CHAPTER Teaching Family and Friends in Your Community 9 Old people can remember when there were fewer problems with teeth and gums. Children s teeth were stronger and adults kept their teeth longer.

More information

CCM6+7+ Unit 12 Data Collection and Analysis

CCM6+7+ Unit 12 Data Collection and Analysis Page 1 CCM6+7+ Unit 12 Packet: Statistics and Data Analysis CCM6+7+ Unit 12 Data Collection and Analysis Big Ideas Page(s) What is data/statistics? 2-4 Measures of Reliability and Variability: Sampling,

More information

10. Introduction to Multivariate Relationships

10. Introduction to Multivariate Relationships 10. Introduction to Multivariate Relationships Bivariate analyses are informative, but we usually need to take into account many variables. Many explanatory variables have an influence on any particular

More information

Chapter 4. More On Bivariate Data. More on Bivariate Data: 4.1: Transforming Relationships 4.2: Cautions about Correlation

Chapter 4. More On Bivariate Data. More on Bivariate Data: 4.1: Transforming Relationships 4.2: Cautions about Correlation Chapter 4 More On Bivariate Data Chapter 3 discussed methods for describing and summarizing bivariate data. However, the focus was on linear relationships. In this chapter, we are introduced to methods

More information

Why we get hungry: Module 1, Part 1: Full report

Why we get hungry: Module 1, Part 1: Full report Why we get hungry: Module 1, Part 1: Full report Print PDF Does Anyone Understand Hunger? Hunger is not simply a signal that your stomach is out of food. It s not simply a time when your body can switch

More information

UNIT II: RESEARCH METHODS

UNIT II: RESEARCH METHODS THINKING CRITICALLY WITH PSYCHOLOGICAL SCIENCE UNIT II: RESEARCH METHODS Module 4: The Need for Psychological Science Module 5: Scientific Method and Description Module 6: Correlation and Experimentation

More information

Regression. Regression lines CHAPTER 5

Regression. Regression lines CHAPTER 5 CHAPTER 5 NASA/GSFC Can scientists predict in advance how many hurricanes the coming season will bring? Exercise 5.44 has some data. Regression IN THIS CHAPTER WE COVER... Linear (straight-line) relationships

More information

Chapter Eight: Multivariate Analysis

Chapter Eight: Multivariate Analysis Chapter Eight: Multivariate Analysis Up until now, we have covered univariate ( one variable ) analysis and bivariate ( two variables ) analysis. We can also measure the simultaneous effects of two or

More information

The Logic of Causal Order Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015

The Logic of Causal Order Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 The Logic of Causal Order Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 [NOTE: Toolbook files will be used when presenting this material] First,

More information

Homework 2 Math 11, UCSD, Winter 2018 Due on Tuesday, 23rd January

Homework 2 Math 11, UCSD, Winter 2018 Due on Tuesday, 23rd January PID: Last Name, First Name: Section: Approximate time spent to complete this assignment: hour(s) Readings: Chapters 7, 8 and 9. Homework 2 Math 11, UCSD, Winter 2018 Due on Tuesday, 23rd January Exercise

More information

Lab 4 (M13) Objective: This lab will give you more practice exploring the shape of data, and in particular in breaking the data into two groups.

Lab 4 (M13) Objective: This lab will give you more practice exploring the shape of data, and in particular in breaking the data into two groups. Lab 4 (M13) Objective: This lab will give you more practice exploring the shape of data, and in particular in breaking the data into two groups. Activity 1 Examining Data From Class Background Download

More information

05/26/2011 Page 1 of 15

05/26/2011 Page 1 of 15 Number of IYS 2010 Respondents N Total Grade 198 203 401 Avg Age N Avg How old are you? 11.9 198 13.9 203 Gender % N % N Female 4 96 5 115 Male 5 99 4 87 Race/Ethnicity N % N % N White 8 165 8 176 Black

More information

05/26/2011 Page 1 of 15

05/26/2011 Page 1 of 15 Number of IYS 2010 Respondents N Total Grade 101 102 203 Avg Age N Avg How old are you? 11.8 101 13.7 102 Gender % N % N Female 4 43 5 52 Male 5 57 4 50 Race/Ethnicity N % N % N White 9 97 9 99 Black /

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture 02 Experimental Design Strategy Welcome back to the course on Biostatistics

More information

ESL Health Unit Unit Four Healthy Aging Lesson Two Exercise

ESL Health Unit Unit Four Healthy Aging Lesson Two Exercise ESL Health Unit Unit Four Healthy Aging Lesson Two Exercise Reading and Writing Practice Advanced Beginning Checklist for Learning: Below are some of the goals of this lesson. Which ones are your goals

More information

Examining Relationships Least-squares regression. Sections 2.3

Examining Relationships Least-squares regression. Sections 2.3 Examining Relationships Least-squares regression Sections 2.3 The regression line A regression line describes a one-way linear relationship between variables. An explanatory variable, x, explains variability

More information

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points. Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points. 1. The bell-shaped frequency curve is so common that if a population has this shape, the measurements are

More information

Non-fiction: Attacking Asthma. For kids with asthma, the air they breathe makes a difference.

Non-fiction: Attacking Asthma. For kids with asthma, the air they breathe makes a difference. Attacking Asthma By Sandra J. Jordan For kids with asthma, the air they breathe makes a difference. Warm, stuffy, or dusty rooms are chancy for Alex D. of Fairview Heights, Ill. That s because hot air,

More information

STATISTICS 201. Survey: Provide this Info. How familiar are you with these? Survey, continued IMPORTANT NOTE. Regression and ANOVA 9/29/2013

STATISTICS 201. Survey: Provide this Info. How familiar are you with these? Survey, continued IMPORTANT NOTE. Regression and ANOVA 9/29/2013 STATISTICS 201 Survey: Provide this Info Outline for today: Go over syllabus Provide requested information on survey (handed out in class) Brief introduction and hands-on activity Name Major/Program Year

More information

A teaching presentation to help general psychology students overcome the common misconception that correlation equals causation

A teaching presentation to help general psychology students overcome the common misconception that correlation equals causation A teaching presentation to help general psychology students overcome the common misconception that correlation equals causation 1 Original A teaching presentation to help general psychology students overcome

More information

t-test for r Copyright 2000 Tom Malloy. All rights reserved

t-test for r Copyright 2000 Tom Malloy. All rights reserved t-test for r Copyright 2000 Tom Malloy. All rights reserved This is the text of the in-class lecture which accompanied the Authorware visual graphics on this topic. You may print this text out and use

More information
